Skip to content

DNS servers should have NS and SOA records #8047

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Apr 24, 2025

i've marked this "ready", but i'm not merging this as-is - the progenitor patch here is not suitable and i'm going to go bump progenitor like a normal person first, and maybe even not have to touch progenitor versions in this change at all. that might bring the delta back down to below +2000...

aside from Progenitor, this is definitely ready for eyes!


this is probably the more exciting part of the issues outlined in #6944. the changes here get us to the point that for both internal and external DNS, we have:

  • A/AAAA records for the DNS servers in the internal/external group (named ns1.<zone>, ns2.<zone>, ...)
  • NS records for those servers at the zone apex, one for each of the ns*.<zone> described above
  • an SOA record for the zone apex for each of oxide.internal (for internal DNS) and $delegated_domain (for external DNS)
  • the SOA's serial is updated whenever the zone is changed. serial numbers are the DNS config generation, so they start from 1 and tick upward with each change. this is different from most SOA serial schemes (in particular the ones that would use YYYYMMDDNN numbering schemes) but so far as i can tell this is consistent with RFC 1035 requirements.

we do not support zone transfers here. i believe the SOA record here would be reasonable to guide zone transfers if we did, but obviously that's not something i've tested.

SOA fields

the SOA record's RNAME is hardcoded to admin@<zone_name>. this is out of expediency to provide something, but it's probably wrong most of the time. there's no way to get an MX record installed for <zone_name> in the rack's external DNS servers, so barring DNS hijinks in the deployed environment, this will be a dead address. problems here are:

  • we would want to take in an administrative email at rack setup time, so that would be minor plumbing
  • more importantly, what to backfill this with for deployed systems?

it seems like the best answer here is to allow configuration of the rack's delegated domain and zone after initial setup, and being able to update an administrative email would fit in pretty naturally there. but we don't have that right now, so admin@ it is. configuration of external DNS is probably more important in the context of zone transfers and permitting a list of remote addresses to whom we're willing to permit zone transfers. so it feels like this is in the API's future at some point.

bonus

one minorly interesting observation along the way is that external DNS servers in particular are reachable at a few addresses - whichever public address they get in the rack's internal address range, and whichever address they get in the external address range. the public address is what's used for A/AAAA records. so, if you're looking around from inside a DNS zone you can get odd-looking answers like:

# 172.30.1.5 is the internal address that an external DNS server is bound to.
# oxide.test is the delegated domain for this local Omicron deployment.
root@oxz_external_dns_68c5e255:~# dig +short ns2.oxide.test @172.30.1.5
192.168.0.161
root@oxz_external_dns_68c5e255:~# dig +short soa oxide.test @172.30.1.5
ns1.oxide.test. admin.oxide.test. 2 3600 600 18000 150
root@oxz_external_dns_68c5e255:~# dig +short ns oxide.test @172.30.1.5
ns1.oxide.test.
ns2.oxide.test.
# 192.168.0.160 is an external address for this same server.
# there are no records referencing 172.30.1.5 here.
root@oxz_external_dns_68c5e255:~# dig +short ns oxide.test @192.168.0.160
ns1.oxide.test.
ns2.oxide.test.
root@oxz_external_dns_68c5e255:~# dig +short ns1.oxide.test @192.168.0.160
192.168.0.160

@iximeow iximeow added the release notes reminder to include this in the release notes label Apr 24, 2025
@iximeow iximeow force-pushed the ixi/dns-ns-and-soa branch 2 times, most recently from 842455b to f349290 Compare April 25, 2025 21:50
@iximeow iximeow force-pushed the ixi/dns-ns-and-soa branch from f349290 to fa47ab1 Compare April 25, 2025 22:08
Cargo.toml Outdated
Comment on lines 904 to 906
[patch.crates-io]
progenitor = { git = "https://github.com/oxidecomputer/progenitor", rev = "e4af3302c20e35dff6ceafc61e0175739922c132" }
progenitor-client = { git = "https://github.com/oxidecomputer/progenitor", rev = "e4af3302c20e35dff6ceafc61e0175739922c132" }
Copy link
Member Author

@iximeow iximeow Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm going to go plumb the Progenitor bump separately, i don't want to make Cargo.toml changes here (moving DNS servers to versioned APIs is noisy enough!)

please disregard Cargo.toml changes from your eyes and i'll promise to not merge this until this (and Cargo.lock) are unchanged 😇

Comment on lines +47 to +54
/// Perform a *lossy* conversion from the V2 [`DnsConfig`] to the V1
/// [`v1::config::DnsConfig`]. In particular, V2 adds NS and SOA records,
/// which did not exist in V1, so they are silently discarded when
/// converting down.
///
/// If this conversion would leave an empty zone, the zone is omitted
/// entirely.
pub fn as_v1(self) -> v1::config::DnsConfig {
Copy link
Member Author

@iximeow iximeow May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so as it turns out, there is one place where we do a get records -> update -> put records pattern, and that is dnsadm. that means that using a dnsadm that uses the V1 API, currently, could talk to a V2 DNS server, get some records, experience this lossy conversion, add a new record, and then put a new set of mangled records back to the server.

in practice i don't think this is an issue: the only records that would be lost are NS and SOA, which are only on the zone apex. since those are also the only records at the apex, any time a V1 client gets records from a V2 server the @ records would be omitted (since the lossy conversion would filter all records), so for this cross-version example to actually modify records the V1 client doesn't know about you'd have to add an A or AAAA or SRV record to @.

in reality, i think that this use of dnsadm is very rare. i've ~never seen it come up! i've used it but not to alter records. so the above seems like low risk, and why i didn't add support for NS and SOA record management to it either.

this is the biggest wrinkle of the DNS API version bump and if there's objections to my approach here, please shout!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how hard this would be to add but would it make sense to disallow a PUT with v1 when any v2 features have been used already (i.e., there are SOA or NS records)? I wonder if we even want to disallow GETs. The main reason to support both versions is for the intermediate state but once anything has started using v2 we don't expect to go backwards.

I don't think this is a big deal because this is mainly a development tool and it probably hasn't been used in ages anyway.

Comment on lines +174 to +207
impl From<Srv> for DnsRecord {
fn from(srv: Srv) -> Self {
DnsRecord::Srv(srv)
}
}

#[derive(
Clone,
Debug,
Serialize,
Deserialize,
JsonSchema,
PartialEq,
Eq,
PartialOrd,
Ord,
)]
pub struct Srv {
pub prio: u16,
pub weight: u16,
pub port: u16,
pub target: String,
}

impl From<v1::config::Srv> for Srv {
fn from(other: v1::config::Srv) -> Self {
Srv {
prio: other.prio,
weight: other.weight,
port: other.port,
target: other.target,
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other option here is to use the v1::config::Srv type directly in v2, because it really has not changed. weaving the V1/V2 types together seems more difficult to think about generally, but i'm very open to the duplication being more confusing if folks feel that way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably use the v1 types directly but I can see going either way.

@@ -3582,7 +3582,7 @@
]
},
"RotImageError": {
"description": "RotImageError\n\n<details><summary>JSON schema</summary>\n\n```json { \"type\": \"string\", \"enum\": [ \"unchecked\", \"first_page_erased\", \"partially_programmed\", \"invalid_length\", \"header_not_programmed\", \"bootloader_too_small\", \"bad_magic\", \"header_image_size\", \"unaligned_length\", \"unsupported_type\", \"reset_vector_not_thumb2\", \"reset_vector\", \"signature\" ] } ``` </details>",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm pretty confused about the backticks showing up in this file!

@iximeow iximeow marked this pull request as ready for review May 1, 2025 21:58
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to look at this closely.

Comment on lines +64 to +65
.try_into()
.expect("can convert v1 DnsCnofigParams to v2"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this not use Into?

Also there's a typo in DnsCnofigParams.

Comment on lines +47 to +54
/// Perform a *lossy* conversion from the V2 [`DnsConfig`] to the V1
/// [`v1::config::DnsConfig`]. In particular, V2 adds NS and SOA records,
/// which did not exist in V1, so they are silently discarded when
/// converting down.
///
/// If this conversion would leave an empty zone, the zone is omitted
/// entirely.
pub fn as_v1(self) -> v1::config::DnsConfig {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how hard this would be to add but would it make sense to disallow a PUT with v1 when any v2 features have been used already (i.e., there are SOA or NS records)? I wonder if we even want to disallow GETs. The main reason to support both versions is for the intermediate state but once anything has started using v2 we don't expect to go backwards.

I don't think this is a big deal because this is mainly a development tool and it probably hasn't been used in ages anyway.

Comment on lines +174 to +207
impl From<Srv> for DnsRecord {
fn from(srv: Srv) -> Self {
DnsRecord::Srv(srv)
}
}

#[derive(
Clone,
Debug,
Serialize,
Deserialize,
JsonSchema,
PartialEq,
Eq,
PartialOrd,
Ord,
)]
pub struct Srv {
pub prio: u16,
pub weight: u16,
pub port: u16,
pub target: String,
}

impl From<v1::config::Srv> for Srv {
fn from(other: v1::config::Srv) -> Self {
Srv {
prio: other.prio,
weight: other.weight,
port: other.port,
target: other.target,
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably use the v1 types directly but I can see going either way.

@@ -1067,7 +1099,9 @@ mod test {
.map(|record| match record {
DnsRecord::A(v) => IpAddr::V4(*v),
DnsRecord::Aaaa(v) => IpAddr::V6(*v),
DnsRecord::Srv(_) => panic!("unexpected SRV record"),
other => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: I'd explicitly put the three patterns here. That will prompt someone to revisit this if we add future ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes reminder to include this in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants